WORLD INTELLECTUAL PROPERTY ORGANIZATION

International Bureau

PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)




(51) International Patent Classification 6:
     C12N 15/00, C12P 21/00, A01H 1/04

A1

(11) International Publication Number: WO 95/21249

(43) International Publication Date: 10 August 1995 (10.08.95)

(21) International Application Number: PCT/US95/01495

(22) International Filing Date: 3 February 1995 (03.02.95)

(30) Priority Data:
     08/192,152    3 February 1994 (03.02.94)    US

(71) Applicant: THE SCRIPPS RESEARCH INSTITUTE [US/US];
     10666 North Torrey Pines Road, La Jolla, CA 92037 (US).

(72) Inventors: BEACHY, Roger, N.; 751 Caminito Bassano, La
     Jolla, CA 92037 (US). MARCOS, Jose, F.; 4249 Nobel
     Drive, No. 28, San Diego, CA 92122 (US).

(74) Agents: BOSTICH, June, M. et al.; Spensley Horn Jubas &
     [illegible], [illegible] Park East, 5th floor, Los Angeles, CA
     [illegible] (US).

(81) Designated States: [illegible], CA, FI, JP, NO, European patent (AT,
     [illegible], GR, [illegible], LU, MC, NL, PT, SE).

Published
     With international search report.



(54) Title: A CASSETTE TO ACCUMULATE MULTIPLE PROTEINS THROUGH SYNTHESIS OF A SELF-PROCESSING
POLYPEPTIDE

(57) Abstract

[...] upon the nuclear inclusion [...]

[...] also includes the protease which releases [...]






FOR THE PURPOSES OF INFORMATION ONLY 



Codes used to identify States party to the PCT on the front pages of pamphlets publishing international 
applications under the PCT. 



AT Austria 

AU Australia 

BB Barbados 

BE Belgium 

BF Burkina Faso 

BG Bulgaria 

BJ Benin 

BR Brazil 

BY Belarus 

CA Canada 

CF Central African Republic 

CG Congo 

CH Switzerland 

CI Côte d'Ivoire

CM Cameroon 

CN China 

CS Czechoslovakia 

CZ Czech Republic 

DE Germany 

DK Denmark 

ES Spain 

FI Finland

FR France 

GA Gabon 



GB United Kingdom 

GE Georgia 

GN Guinea 

GR Greece 

HU Hungary 

IE Ireland 

IT Italy 

JP Japan

KE Kenya 

KG Kyrgystan 

KP Democratic People's Republic 
of Korea 

KR Republic of Korea 

KZ Kazakhstan 

LI Liechtenstein 

LK Sri Lanka 

LU Luxembourg 

LV Latvia 

MC Monaco 

MD Republic of Moldova 

MG Madagascar 

ML Mali 

MN Mongolia 



MR Mauritania

MW Malawi

NE Niger

NL Netherlands

NO Norway

NZ New Zealand

PL Poland

PT Portugal

RO Romania

RU Russian Federation

SD Sudan

SE Sweden

SI Slovenia

SK Slovakia

SN Senegal

TD Chad

TG Togo

TJ Tajikistan

TT Trinidad and Tobago

UA Ukraine

US United States of America

UZ Uzbekistan

VN Viet Nam






WO 95/21249                                                        PCT/US95/01495



A CASSETTE TO ACCUMULATE MULTIPLE PROTEINS THROUGH
SYNTHESIS OF A SELF-PROCESSING POLYPEPTIDE

This invention was made with government support under Grant No. R01-AI 27161-
05 A1 from the National Institutes of Health. The government has certain rights in this
invention.



BACKGROUND OF THE INVENTION 



1. Field of the Invention

This invention relates to methods for plant transformation to enhance and control gene
expression. More particularly, this invention relates to a method for expressing more than
one transgene in plants in equimolar amounts from a single promoter.



2. Description of Related Art 

In recent years, development of plant transformation techniques and strategies for 
enhancing and controlling gene expression have broadened the practical applications of 
plant biotechnology. However, all of these techniques must contend with the
problems encountered when more than one transgene is expressed in planta.

Current approaches to expressing more than one gene in transgenic plants require the use 
of multiple promoters, which in itself presents problems related to levels of expression 
from each promoter. For example, the relative levels of expression in potato plants of two 
genes encoding two viral coat proteins (CP), which were introduced via a single Ti- 
derived transformation vector, were different in different plant lines (C. Lawson, et al., 
Bio/Technology, 8:127-134, 1990). In an alternative approach, plants are retransformed
with a second gene, but this technique may induce gene silencing effects (M. Matzke, et
al., EMBO J., 8:643-649, 1989; T. Fujiwara, et al., Plant Cell Rep., 12:133-138, 1993).
In addition, sexual crossing of different transgenic lines may enhance or inhibit gene
expression depending on gene copy number and the nature of the gene insertion (S.
Hobbs, et al., Plant Mol. Biol., 21:17-26, 1993). Therefore, the relative levels of expression
of two transgenes in a plant cannot be predicted with any of these
approaches; they are instead a consequence of experimental variability.

Therefore, an alternative mechanism to express multiple genes in a single transgenic line,
for instance in techniques designed to improve pathogen-derived protection against plant
viruses, is desirable. Systems that allow equimolar accumulation of two or more
proteins under the control of a single transcriptional promoter would avoid the problems
outlined above, while providing the additional advantage of producing equal amounts
of the products of the two transgenes in each plant.

Several plant and animal viruses encode proteinases that cleave viral polypeptides,
yielding mature proteins. For instance, plant potyviral genomes are expressed through
the translation of a single polypeptide which is processed to release multiple individual
viral proteins (J. Riechmann, et al., J. Gen. Virol., 73:1-16, 1992). Three viral proteinase
activities have been implicated in this processing (J. Carrington, et al., EMBO J., 9:1347-
1353, 1990; J. Verchot, et al., Virology, 185:527-535, 1991). One of these, associated
with the nuclear inclusion (NIa) protein, has been widely studied in the case of tobacco
etch potyvirus (TEV) (J. Carrington, et al., J. Virol., 62:2313-2320, 1988; J. Carrington,
et al., J. Virol., 61:2540-2548, 1987), and is responsible for several processing events
involving the large viral polypeptide. NIa from TEV functions during post-translational
processing through the recognition and cleavage of a specific heptapeptide (J. Carrington,
et al., Proc. Natl. Acad. Sci. USA, 85:3391-3395, 1988; W. Dougherty, et al., EMBO J.,
7:1281-1287, 1988). Taking advantage of this well-characterized proteinase activity, an
expression cassette based on the TEV-NIa protein has been developed. This cassette
vector allows the synthesis of two or more proteins in equimolar amounts as part of a
polyprotein that is cleaved into individual mature proteins by the NIa proteolytic activity.



SUMMARY OF THE INVENTION

A cassette expression vector based on the nuclear inclusion (NIa) protease from tobacco
etch virus (TEV) allows the transcription and translation of a nucleotide sequence
comprising the TEV NIa coding region flanked on each side by its heptapeptide cleavage
sequences and by insertion sites for in-frame insertion of two different open reading frames
coding for heterologous proteins. Upon translation of the resulting polypeptide, the
protease releases the two heterologous proteins in equimolar amounts by an autoproteolytic
reaction. Therefore, the invention provides a method for obtaining equimolar amounts
of different proteins expressed under the control of a common promoter. Alternatively,
a plurality of insertion sites can be engineered into a cassette containing a single TEV
NIa protease gene for production of a plurality of peptides. In vitro or in vivo, the
expression cassette functions to express genes encoding two or more different
heterologous peptides from a single polypeptide by post-translational self-cleavage by
the NIa protease.



BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1A is a schematic diagram of a TEV-NIa-based expression cassette vector, pPRO1.
The open box represents the NIa open reading frame. The shaded areas enlarged above
show (as both nucleotide and amino acid sequence) the heptapeptide recognition sequence
for the NIa proteolytic activity at both the N- and C-termini of NIa; the engineered Sma I and
Stu I cloning sites (underlined) for the in-frame introduction of different genes; and the start
ATG and stop TGA codons. The NIa processing site between Gln and Gly is indicated
by an open arrowhead. The sequence of the TEV 5' non-translated region is also indicated
by a black arrow upstream of the NIa coding sequence. Relevant unique restriction
enzyme sites are indicated: Ba (BamH I), Bg (Bgl II), Ec (EcoR I), Sa (Sal I), Sc (Sac I),
Sm (Sma I), and St (Stu I).

Figure 1B is a detailed restriction map of pPRO1 displaying the nucleotide sequence and
the amino acid sequence of the NIa protease (SEQUENCE I.D. NO. 6).

Figure 1C is a schematic diagram showing amino acid additions that result at the N- and C-
termini of proteins cloned at the Sma I or Stu I restriction enzyme insertion sites of
expression vector pPRO1 upon translation and subsequent proteolytic processing. The
amino acid represented by X depends upon the particular restriction site used for cloning
and can be coincident with amino acids in the cloned proteins in some cases.



Figure 2 shows an autoradiograph of an SDS-PAGE gel indicating the results of in vitro
translation of RNA transcribed from the pPRO1 expression cassette. Translation
reactions were programmed with 1 µg of brome mosaic virus (BMV) RNAs (lane B),
with no RNA added (lane 0), and with RNA transcribed in vitro from pPRO1 (lane 1).
The molecular mass (in kDa), positions of the major proteins translated from BMV
RNAs, and the position of the 49 kDa TEV NIa protein are indicated.

Figure 3A shows a schematic representation of six different polypeptides translated
from RNAs transcribed in vitro from different pPRO1-derived constructs containing the TMV CP
sequence. Open boxes represent the TEV-NIa sequence. Striped boxes represent the
TMV CP sequence contained in the insertion site. The names of the constructs and the
expected molecular mass of the translated and processed products are indicated. Q/G
indicates the amino acid residues at the cleavage sequence in constructs cloned in pPRO1,
whereas H/G indicates the Gln-to-His mutation at the -1 position that inhibits processing by
NIa in constructs cloned in pPRO4.

Figure 3B shows an autoradiograph of an SDS-PAGE gel containing in vitro translation
products obtained from the constructs shown in Figure 3A.

The vertical axis and lane assignments are the same as described for Figure 3C below.

Figure 3C shows fluorographs of immunoprecipitation analyses using anti-TMV CP
antibody with aliquots from the translation samples shown in Figure 3B. In Figures 3B
and 3C, translation reactions were programmed with no RNA added (lane 0); with RNA
transcribed in vitro from pPRO1 (lane 1); pPRO1.NT (lane 2); pPRO1.TN (lane 3);
pPRO1.TAN (lane 4); pPRO4.NT (lane 5); and pPRO4.TN (lane 6). The molecular mass
(in kDa) and positions of 14C-labeled protein markers are indicated. T = TMV coat protein;
N = NIa protease.

Figure 4 shows the results of in vitro translation of RNAs transcribed from pPRO1
constructs containing TMV CP and SMV CP coding sequences inserted at two sites in the
cassette.

Figure 4A is a schematic diagram representing the vectors pPRO1.SNT and pPRO1.TNS.
The open box represents the TEV-NIa sequence. Striped and dotted boxes represent
TMV CP and SMV CP sequences, respectively, that have been inserted into the cassette
insertion sites. S = SMV coat protein.

Figure 4B shows an autoradiograph of an SDS-PAGE gel with in vitro translation
products obtained from the pPRO1.SNT and pPRO1.TNS vectors. Translation reactions were
programmed with no RNA added (lane 0); with RNA transcribed in vitro from pPRO1
(lane 1); pPRO1.SNT (lane 2); and pPRO1.TNS (lane 3). The molecular mass (in kDa),
positions of the major proteins translated from BMV RNAs, and the positions of the TEV
NIa, SMV CP and TMV CP are indicated.

Figure 5 shows the results of in vitro translation of RNAs transcribed from a pPRO1
vector containing SMV CP and uidA (β-glucuronidase, GUS) coding sequences in the
two insertion sites.

Figure 5A shows a schematic diagram representing the vector pPRO1.SNG. The open
box represents TEV-NIa sequences. Dotted and striped boxes represent SMV CP and
uidA (β-glucuronidase) sequences, respectively.
G = uidA, GUS enzyme.

Figure 5B shows an autoradiograph of an SDS-PAGE gel with in vitro translation
products obtained from cassette vector pPRO1.SNG. Translation reactions were
programmed with no RNA added (lane 0); and with RNA transcribed in vitro either
from pPRO1 (lane 1) or pPRO1.SNG (lane 2). The molecular mass (in kDa), the
positions of proteins translated from BMV RNAs, and the positions of the TEV NIa,
GUS, and SMV CP proteins are indicated. A black arrowhead indicates the position
of a 110 kDa polypeptide present in small amounts.

Figure 5C shows a photograph of an SDS-PAGE gel used in a time course in vitro
translation reaction with vector pPRO1.SNG. Samples were withdrawn at the times (in
minutes) indicated at the top of each lane. After 15 minutes of incubation, no 149 kDa
precursor polypeptide could be detected by SDS-PAGE.



DETAILED DESCRIPTION OF THE INVENTION






In TEV, the NIa protease is synthesized as part of the polyprotein that results from the
translation of the TEV genome. The genomic sequence of TEV, first disclosed by R.
Allison, et al. (Virology, 154:9-20, 1986), is publicly available from the EMBL and GenBank
databases under accession number M15239. NIa recognizes and cleaves specific
sequences of seven amino acids (heptapeptides) contained in the polyprotein and is
responsible for partial processing of the viral polyprotein. Heptapeptide cleavage
sequences recognized by the NIa from TEV (immediately 5-prime and 3-prime) have
been shown to be Glu-X-X-Tyr-X-Gln-Gly (SEQUENCE I.D. NO. 1) or Glu-X-X-Tyr-X-
Gln-Ser (SEQUENCE I.D. NO. 2), wherein X can be any amino acid (J. Carrington, et al.,
1988, supra and W. Dougherty, et al., supra). Cleavage by the TEV-NIa protease
occurs after the Gln amino acid. In one embodiment of the present invention, the self-
recognized cleavage sequence at the N-terminus of the NIa protease is Glu-Pro-Val-Tyr-
Phe-Gln-Gly (SEQUENCE I.D. NO. 3) and the self-recognized cleavage sequence at the
C-terminus is Glu-Leu-Val-Tyr-Ser-Gln-Gly (SEQUENCE I.D. NO. 4). These two
heptapeptides are the ones that bracket the NIa protein in the TEV polyprotein.
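
The heptapeptide consensus described above lends itself to a short illustration. The following is a minimal sketch, not part of the disclosed method: it encodes the consensus Glu-X-X-Tyr-X-Gln-(Gly or Ser) as a one-letter-code pattern and, following the description of Figure 1A, places the cut between Gln and the following Gly or Ser. The function names and the toy flanking residues are hypothetical and chosen only for illustration.

```python
import re

# Consensus from the text: Glu-X-X-Tyr-X-Gln-(Gly|Ser), in one-letter code.
# A lookahead is used so that overlapping candidate sites are all reported.
NIA_SITE = re.compile(r"(?=(E[A-Z]{2}Y[A-Z]Q[GS]))")

# The two self-recognized heptapeptides named in the text.
SEQ_ID_3 = "EPVYFQG"  # Glu-Pro-Val-Tyr-Phe-Gln-Gly (N-terminal side of NIa)
SEQ_ID_4 = "ELVYSQG"  # Glu-Leu-Val-Tyr-Ser-Gln-Gly (C-terminal side of NIa)


def find_cleavage_points(protein: str) -> list[int]:
    """Return 0-based indices of the residue that follows each cut.

    The cut is assumed to fall between Gln (motif position 6) and
    Gly/Ser (motif position 7).
    """
    return [m.start() + 6 for m in NIA_SITE.finditer(protein)]


def cleave(protein: str) -> list[str]:
    """Split a one-letter protein string at every predicted cut."""
    bounds = [0, *find_cleavage_points(protein), len(protein)]
    return [protein[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Toy chain: two arbitrary residues, one heptapeptide, two arbitrary residues.
    print(cleave("MA" + SEQ_ID_3 + "KK"))
```

In this sketch the Gly or Ser of the heptapeptide stays on the downstream fragment and the first six residues stay on the upstream fragment, which is one way to see why the released proteins carry extra terminal residues.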

NIa releases itself from the polyprotein in an autoproteolytic reaction attacking at the
cleavage sequences (J. Carrington, et al., Virology, 160:355-362, 1987), and is active both
in cis, processing polypeptides in which it is included, and in trans, simultaneously
cleaving different polypeptides. The cis protease activity of NIa has been assayed with
different TEV polyproteins produced in vitro which contained NIa and either naturally
occurring or mutated versions of the cleavage sequence (J. Carrington, et al., J. Virology,
1988, 1987, supra). Protease activity in trans has been observed in many studies using
as substrates TEV polyproteins that were labeled in vitro and incubated with NIa
extracted from infected plants.

The TEV-NIa based expression cassette provided herein has been constructed to exploit
the protease activity of NIa in a self-processing polypeptide in order to express two or
more different proteins in equimolar amounts. For instance, a cassette vector named
pPRO1, shown in Figure 1, was obtained by PCR amplification using a full-length
cloned TEV cDNA as template. It comprises PRO1 (SEQUENCE ID NO. 5), which includes
an open reading frame encompassing the NIa sequence (TEV nucleotides 5673 to 6983
as numbered in R. Allison, et al., Virology, 154:9-20, 1986) as well as the target
heptapeptides located at its N-terminus (SEQUENCE ID NO. 3) and C-terminus
(SEQUENCE ID NO. 4). The TEV-NIa based cassette described herein also provides
at least two blunt end restriction sites, preferably unique, that allow the in-frame insertion
of heterologous protein sequences for expression as part of a self-processing
polypeptide. As used herein, the term "heterologous" shall mean that the gene
inserted into the cassette insertion site is not native to TEV.
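
The requirement for in-frame insertion can be illustrated with simple arithmetic. The sketch below is offered only as an illustration under stated assumptions: it takes the nucleotide range quoted above (5673 to 6983, inclusive) and checks that it spans a whole number of codons, and it includes a hypothetical helper, `is_in_frame_insert`, that accepts a blunt-ended insert only if its length is a multiple of three and it contains no in-frame stop codon. The helper assumes the insert is read from its first nucleotide; it is not taken from the source.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length(first_nt: int, last_nt: int) -> int:
    """Length of a region given inclusive 1-based coordinates."""
    return last_nt - first_nt + 1


def is_in_frame_insert(insert: str) -> bool:
    """Hypothetical check: whole codons only, and no internal stop codon.

    Assumes the reading frame starts at the first nucleotide of the insert.
    """
    insert = insert.upper()
    if len(insert) % 3 != 0:
        return False
    codons = (insert[i:i + 3] for i in range(0, len(insert), 3))
    return not any(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    length = orf_length(5673, 6983)
    # 1311 nucleotides, i.e. 437 codons with no remainder.
    print(length, length // 3, length % 3)
    print(is_in_frame_insert("ATGGCC"), is_in_frame_insert("ATGGC"))
```

A stop codon inside an upstream insert would terminate translation before the protease and the downstream insert are made, which is why the sketch rejects such inserts.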

For instance, in pPRO1 one insertion site is provided by a Sma I restriction enzyme site
at the N-terminus of the TEV NIa sequence, and the other insertion site is provided by a
Stu I restriction enzyme site at the C-terminus. In addition, the cassette optionally
provides a start codon, preferably ATG, and a stop codon, preferably TGA, engineered
upstream of the 5-prime site and downstream of the 3-prime site, respectively. For
instance, in vector pPRO1, which provides two insertion sites, an ATG start codon is
upstream of the Sma I site, and a TGA stop codon is downstream of the Stu I site. In
addition, the TEV-NIa based vectors herein preferably include, upstream of the open
reading frame, the 144 nucleotide 5' non-translated region from TEV RNA, which has
been shown to enhance translation in vitro and in vivo (J. Carrington and D. Freed, J.
Virol., 64:1590-1597, 1990).

One skilled in the art will appreciate that the techniques described herein could be used
to insert more than two unique restriction endonuclease sites and heptapeptide recognition
sequences into the expression cassette, so as to express more than two heterologous
proteins. Thus, the number of foreign proteins translated as part of a NIa-containing
polyprotein is not, theoretically, limited to two, and embodiments of the cassette vector
are contemplated within the scope of this invention wherein more than two insertion sites
are useful for simultaneous expression of more than two proteins in equimolar amounts.
In the embodiment of the invention utilizing more than one restriction site on one or both
sides of the gene encoding the NIa protease and its flanking self-recognition sequences,
it will be necessary to provide additional NIa protease self-recognition sequences
between adjacent insertion sites to allow for post-translational self-cleavage by
the NIa protease. A single protease is sufficient to cleave multiple sites within the single
polypeptide produced from expression of the cassette.
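
The layout of a cassette with more than two insertion sites can be sketched as follows. This is a toy model under loudly stated assumptions, not the disclosed construct: the insert and NIa sequences are short placeholder strings, the extra heptapeptide placed between adjacent inserts is arbitrarily taken to be SEQUENCE ID NO. 3, the residues contributed by the cloning sites are ignored, and cleavage is simulated by pattern matching with the cut assumed between Gln and Gly/Ser. The function names are hypothetical.

```python
import re

N_SITE = "EPVYFQG"  # SEQUENCE ID NO. 3
C_SITE = "ELVYSQG"  # SEQUENCE ID NO. 4
SITE = re.compile(r"(?=(E[A-Z]{2}Y[A-Z]Q[GS]))")


def build_polyprotein(upstream, nia, downstream, linker=N_SITE):
    """Join inserts around the protease, with a heptapeptide between neighbours.

    upstream / downstream are lists of insert sequences (one-letter code).
    The leading "M" stands for the start codon supplied by the cassette.
    """
    left = linker.join(upstream)
    right = linker.join(downstream)
    return "M" + left + N_SITE + nia + C_SITE + right


def self_process(polyprotein):
    """Cut after the Gln of every heptapeptide and return the fragments."""
    cuts = [m.start() + 6 for m in SITE.finditer(polyprotein)]
    bounds = [0, *cuts, len(polyprotein)]
    return [polyprotein[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Placeholder sequences only; real inserts would be full-length proteins.
    poly = build_polyprotein(["AAAA", "CCCC"], "NNNNN", ["DDDD"])
    for fragment in self_process(poly):
        print(fragment)
```

Because every fragment originates from the same translated chain, each round of translation and complete processing yields exactly one copy of each product in this model, which is the sense in which accumulation is equimolar. The fragments also show the leftover heptapeptide residues at their termini.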



PRO1 (Figure 1B; SEQUENCE ID NO. 6) was sequenced using techniques known in the
art, and six mutations from the native sequence previously published for TEV were found.
These changes were, according to the numbering in Allison, supra, GC to CG at nucleotides
5768-5769, A to G at nucleotide 5773, A to G at nucleotide 6235, T to C at nucleotide
6314, and A to G at nucleotide 6961. The mutations were left unmodified as they did not
affect the protease activity of NIa, as shown by the results presented herein.
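
To relate the listed nucleotide changes to codons, one can convert each genome coordinate into a codon number and a position within that codon. The sketch below does only that arithmetic. It assumes, without confirmation from the source, that the reading frame of the cassette begins at TEV nucleotide 5673 (the first nucleotide of the range quoted for the open reading frame), so the codon numbers are relative to that coordinate and are not residue numbers of the mature NIa protein. The function name is hypothetical.

```python
ORF_START = 5673  # assumed frame start, taken from the quoted nucleotide range

# Nucleotide positions of the six changes listed in the text.
MUTATED_POSITIONS = [5768, 5769, 5773, 6235, 6314, 6961]


def codon_position(nt: int, orf_start: int = ORF_START) -> tuple[int, int]:
    """Return (codon number, position in codon), both 1-based."""
    offset = nt - orf_start
    if offset < 0:
        raise ValueError("nucleotide lies upstream of the assumed frame start")
    return offset // 3 + 1, offset % 3 + 1


if __name__ == "__main__":
    for nt in MUTATED_POSITIONS:
        codon, pos = codon_position(nt)
        print(f"nt {nt}: codon {codon}, position {pos}")
```

Whether a given change alters the encoded amino acid cannot be decided from positions alone; that would require the actual codons from the sequence listing.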

The cassette expression vectors presented herein, which exploit the proteolytic processing
strategy of the TEV NIa protease, possess the advantages particular to the TEV NIa
protease. First, NIa is a highly specific proteinase whose cleavage sequence has been well
characterized (Carrington, et al., 1988; Dougherty, et al., 1988, supra; W. Dougherty, et
al., Virology, 171:356-364, 1989; Dougherty, et al., Virology, 172:145-155, 1989).
Second, NIa retains activity in vitro when cleavage sequences are inserted into several
locations in TEV polyproteins (Carrington, et al., 1988, supra; Dougherty, et al., 1988,
supra) or into non-viral proteins (Parks, et al., J. Gen. Virol., 73:775-783, 1992). Finally,
NIa cleaves its substrate heptapeptide properly in vivo when expressed as a transgene in
plants (Restrepo-Hartwig, et al., J. Virology, 66:5662-5666, 1992).

In one embodiment of the TEV-NIa-based expression cassette vectors provided herein,
the NIa protease functions in vitro to cleave polypeptides containing inserted coding
sequences for many different polypeptides ranging in size from 1 to as many as about 800
amino acids. In most of the constructs tested, cleavage was so effective that non-
processed precursors could not be detected. In only two cases (an illustration is shown
with pPRO1.SNG in Example 4) were minimal amounts of non-cleaved precursors
detected, indicating a lack of complete processing. These in vitro results suggest utility
of this approach for in vivo applications as well, wherein the vectors are introduced into
suitable plants by electroporation into plant protoplasts using methods well known in the
art (see, for instance, Current Protocols in Molecular Biology, Ed. by F.M. Ausubel,
Current Protocols, Vol. 1, §9.3.2-3, 1993). Transformed protoplasts can be harvested and
grown into full transgenic plants (C. A. Rhodes, et al., Science, 240:204-207, 1988).

In alternative embodiments, NIa-based expression cassette vectors are used in systems
other than those involving plant cells. In general, the expression cassette of this invention
can be used in any system in which the NIa protease has activity, for example, bacterial,
insect, mammalian, and other eukaryotic cells, if operatively linked to suitable
expression control elements, such as a promoter and a polyadenylation sequence, so as
to bring about replication of the attached segment in a vector suitable for the type of cell
line selected. However, for prokaryotic cells it may be necessary to reengineer the vector
to bias its codon usage toward that of the specific organism (see C. J. Noren, et al.,
Science, 244:182, 1989). For example, as is well known, Bacillus spp. generally prefer
more A/T rich nucleotide sequences.

The choice of vector to which a cassette of this invention is operatively linked depends
directly, as is well known in the art, on the host cell to be transformed and the functional
properties desired, e.g., vector replication and protein expression, these being limitations
inherent in the art of constructing recombinant molecules. The vector itself may be of any
suitable type, such as a viral vector (RNA or DNA), naked straight-chain or circular
DNA, or a vesicle or envelope containing the nucleic acid material to be inserted into the
cell. Techniques for construction of lipid vesicles, such as liposomes, are well known.
Such liposomes may be targeted to particular cells using other conventional techniques,
such as providing an antibody or other specific binding molecule on the exterior of the
liposome (see, e.g., A. Huang, et al., J. Biol. Chem., 255:8015-8018, 1980). In one
embodiment of the invention, transient expression is contemplated wherein expression
of the polypeptide is driven either by conventional transcriptional promoters or by plant
viral vectors. In another embodiment, the TEV-NIa based cassette vector is used in
prokaryotic systems, since NIa proteases from different potyviruses have been shown to be
active when expressed in bacterial cells (Garcia, et al., Virology, 170:362-369, 1989;
Vance, et al., Virology, 191:19-30, 1992). The TEV NIa based expression vector can be
advantageously used, therefore, whenever it is desirable to achieve equimolar production
of two peptides in bacterial expression systems by inserting the NIa cassette into a
bacterial expression vector, such as members of the pUC vector family. Other insect and
animal cells known in the art to be useful in expression of recombinant proteins can also
be used. For instance, the cassette vectors can be used in production of recombinant
antibodies wherein it is desirable to achieve equimolar amounts of the heavy and light
chains. In another embodiment, the cassette vectors provided herein are used to produce
molecules that spontaneously assemble into a two-subunit complex, such as an enzyme. In yet
another embodiment, a vector having more than two insertion sites is used to express
multimers of any type.



Proteins expressed in the cassette vectors of this invention contain additional or
extraneous amino acid residues at both N- and C-termini as a consequence of the NIa
target heptapeptide and the cloning strategy used. The schematic diagram of Figure 1C
illustrates the amino acid additions at the N- and C-termini that result when proteins
(open boxes) are cloned at either the Sma I (Sm) or Stu I (St) insertion sites of pPRO1. The
amino acid represented by 'X' will depend on the restriction site used for cloning. In some
cases one or more of the extraneous amino acids can be incorporated into the protein
because it is already native to its sequence and would not have to be engineered in.

Due to the inclusion of additional amino acids at both termini of the cloned peptides, the
biological activity of some proteins expressed in this system may be affected. However,
one skilled in the art will know how to purify the produced proteins and treat them to clip
off the extraneous residues. For instance, as shown in Figure 1C, the heterologous
proteins after cleavage by the protease can have among the extraneous terminal amino
acids an undefined amino acid (represented by 'X') immediately next thereto at either end.
If 'X' is selected to be a methionine and the produced peptide contains no other
methionines, the peptide can readily be treated with cyanogen bromide to remove the
extraneous residues. For example, the coat protein of TMV, which contains no
methionines, can be expressed in one or both of the insertion sites, purified, and then
treated with cyanogen bromide to provide the coat protein sequence free of extraneous
terminal residues. One skilled in the art will be able to similarly utilize enzymes that
cleave peptides between two particular residues to clip off the terminal extraneous
residues from product heterologous peptides.
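
The cyanogen bromide step described above can be sketched in a few lines. Cyanogen bromide cleaves a peptide chain on the C-terminal side of methionine, so the sketch simply splits a one-letter sequence after every 'M'. It is a simplified illustration with a hypothetical function name and a made-up placeholder peptide; it ignores the chemistry of the reaction (for example, the conversion of the methionine itself) and is not the TMV coat protein sequence.

```python
def cnbr_fragments(peptide: str) -> list[str]:
    """Split a one-letter peptide after every methionine (M).

    Models only the position of the cut made by cyanogen bromide,
    which is C-terminal to Met.
    """
    fragments, start = [], 0
    for index, residue in enumerate(peptide):
        if residue == "M":
            fragments.append(peptide[start:index + 1])
            start = index + 1
    if start < len(peptide):
        fragments.append(peptide[start:])
    return fragments


if __name__ == "__main__":
    # Hypothetical product: extraneous "G" plus X = Met, then a Met-free core.
    print(cnbr_fragments("GM" + "AKLTPSV"))
```

The sketch makes the stated condition visible: if the core protein contained an internal methionine, it would be cut as well, which is why the approach is described for peptides with no other methionines.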

Several practical applications of the NIa cassette expression vectors utilizing their
expression in plants as a transgene are also contemplated herein. For instance, coat
protein mediated resistance (CPMR) to viral infections can generally be obtained only
against viruses of the same taxonomic group as the one whose coat protein was used as
the vaccine (Fitchen & Beachy, Annu. Rev. Microbiol., 47:739-763, 1993). To engineer
CPMR against viruses that belong to different
taxonomic groups, sequences encoding two or more viral coat proteins from different
taxonomic groups can be inserted into insertion sites of a NIa-based vector having two
or more insertion sites. Alternatively, an insect resistance gene can be combined with a
virus resistance gene. In an alternative embodiment, the vector of this invention can be
used to express a selectable marker plus any other gene encoding a protein of the size
contemplated herein.

In yet another embodiment of this invention, described in full detail in U.S. Patent
Application Serial No. 08/192,477 cofiled herewith, and incorporated herein by reference,
the vector into which the cassette is ligated is a modification of the "infectious cDNA
clone" of the tobacco mosaic virus to which is operably linked the promoter of the T7
polymerase. Highly infectious RNA transcripts of a full-length cDNA of the
U1 (common) strain of TMV have been produced in vitro using bacteriophage T7 RNA
polymerase (Dawson, et al., Proc. Natl. Acad. Sci. USA, 83:1832-1836, 1986; Meshi, et
al., Proc. Natl. Acad. Sci. USA, 83:5043-5047, 1986). When inoculated into
tobacco plants and other suitable host plants, this transcript causes systemic viral
infection. Therefore, the vector of this invention can also be used to simultaneously
provide systemic resistance to insects and viruses in plants when inserted into the infectious
cDNA clone of TMV.

In this embodiment of the invention, to accommodate the cassette to be inserted therein,
the cDNA encoding the TMV movement protein is deleted from the TMV infectious
clone, and the NIa-based cassette is ligated in its place, thereby creating a modified viral
vector. The modified infectious clone, carrying nucleotide sequences encoding heterologous
peptides ligated into the insertion sites of the NIa-based cassette, can be
inoculated into host plants for expression therein. Therefore, in this embodiment of the
invention the coat proteins of plant viruses belonging to a different taxonomic group than
TMV, or other genes capable of protecting a plant against insects or disease, can be ligated
into the insertion sites of the NIa-based cassette in the infectious clone vector for
production in the host plant. Since the modified infectious clone vector retains the native
gene encoding the coat protein of TMV, a cassette with two insertion sites can be used to
express multiple CP sequences that confer CPMR against viruses from three different
taxonomic groups. If recombinant plants transformed with a gene encoding the wild type
movement protein of TMV, such as plant line 277 (Deom, et al., Cell, 69:221-224,
1992), are inoculated with the modified infectious clone vector, the viral infection will
spread systemically. This modified infectious clone vector takes advantage of the
extremely high level of expression characteristic of the viral system, and can be used to
economically produce large amounts of polypeptides, virions suitable for use as vaccines,
etc. One skilled in the art will appreciate that such product polypeptides and/or virions
can be purified from plant leaves using standard methods (Bruening, et al., Virology,
71:498-517, 1976).

In initial experiments, constructs containing NIa and the CP of TMV (Figure 3A) were
introduced into Nicotiana tabacum via Agrobacterium tumefaciens transformation.
Preliminary data indicate that TMV CP expressed in vivo as part of pPRO1 confers
CPMR (data not shown). Additional constructs with an insert that encodes a viral coat
protein and a gene encoding β-glucuronidase will enable use of GUS activity as a probe
for the levels of expression of the CP. Since the activity of the CP is destroyed if the
protease does not cleave in the exact place anticipated, this experiment showed the
specificity of the NIa protease for cleaving multiple exogenous peptides. This approach
will be useful for studying those examples in which there is poor correlation between the
levels of CP accumulation and the degree of plant viral resistance, providing additional
important data on the molecular mechanism(s) of CPMR in these cases.

The following examples illustrate the manner in which the invention can be practiced.
It is understood, however, that the examples are for the purpose of illustration and the
invention is not to be regarded as limited to any of the specific materials or conditions
therein.

EXAMPLE 1

CONSTRUCTION OF pPRO1 VECTORS

Recombinant DNA manipulation and E. coli transformation were carried out according
to existing protocols (Sambrook, et al., Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratory, Cold Spring Harbor, New York, 1989). The DNA inserts used
for the assembly of the different constructs were obtained by the polymerase chain
reaction (PCR) using equipment and techniques provided by Perkin Elmer Cetus
(Emeryville, CA). The sequences of primers used for amplification are detailed in Table
1, the prefix indicating the gene to which they are targeted.

The expression cassette vector pPRO1 (Figures 1A and 1B) was assembled in pBluescript
II KS (+) (Stratagene, San Diego, CA) under the transcriptional control of a T7 promoter
by directional insertion of PRO1 (SEQUENCE ID NO. 5) at the Sac I - EcoR I sites of
the multiple cloning site, rendering pPRO1. NIa and 5'-non-translated (5'-NTR) sequences
from TEV were obtained by PCR using as DNA template a full length TEV cDNA clone
(kindly provided by Dr. J. Carrington, Texas A&M University). Oligonucleotide primers
for amplification of NIa were TEVNIA.N and TEVNIA.C (SEQUENCE ID NOS. 7 and
8, respectively). These two primers amplified the NIa open reading frame (Figure 1B)
plus the sequences encoding the two specific heptapeptide cleavage sequences located at
each end of NIa in the TEV genome and contained, in addition, either Xba I and Sma I
(TEVNIA.N) or Stu I and EcoR I (TEVNIA.C) restriction enzyme sites. The PCR
product was directionally inserted into pBluescript using Xba I and EcoR I to yield vector
pBCNIa. Oligonucleotide primers used for PCR amplification of the 5'-NTR of TEV
were TEVNTR.5 and TEVNTR.3 (SEQUENCE ID NOS. 9 and 10, respectively). These
primers contained either Sac I and Bgl II (TEVNTR.5) or Sma I (TEVNTR.3) restriction
enzyme cleavage sites. The final step in the assembly of pPRO1 was a Sac I-Sma I
directed insertion of the TEV 5'-NTR resulting from the PCR reaction into vector
pBCNIa. Mutagenesis of the TEV sequences encoding the heptapeptide protease
cleavage recognition sites was accomplished with primers TEVNIA.N2 and TEVNIA.C3
(SEQUENCE ID NOS. 11 and 12, respectively), which contained either one or two
nucleotide changes (when compared to TEVNIA.N and TEVNIA.C, respectively) that
mutated the glutamine located at position -1 (relative to the cleavage site) to histidine to
introduce an Nco I insertion site useful for recovering the recombinant clones from the
cloning vector pBCNIa.
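
The nucleotide changes described above can be checked against the primer sequences given
in the Sequence Listing. The following Python sketch is illustrative only and is not part of
the original disclosure: the helper names (revcomp, count_changes) are hypothetical, and it
assumes that TEVNIA.C and TEVNIA.C3 are antisense primers, so that their reverse
complements are read as coding strand.

```python
# Primer sequences as given in the Sequence Listing (SEQ ID NOS. 7, 8, 11 and 12).
PRIMERS = {
    "TEVNIA.N": "GCTCTAGACCCGGGGAACCAGTCTATTTCCAAGGG",
    "TEVNIA.C": "GCGAATTCAAGGCCTCCCTTGCGAGTACACCAATTCA",
    "TEVNIA.N2": "TGGCCCGGGGAACCAGTCTATTTCCATGGG",
    "TEVNIA.C3": "GCGAATTCAAGGCCTCCCATGGGAGTACACCAATTCA",
}
NCO_I = "CCATGG"  # Nco I recognition sequence


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def count_changes(wild_type, mutant, window):
    """Count mismatches in the last `window` nucleotides (3'-aligned primers)."""
    return sum(a != b for a, b in zip(wild_type[-window:], mutant[-window:]))


# Coding-strand view of each primer (assumption: the ".C" primers are antisense).
CODING = {
    name: revcomp(seq) if ".C" in name else seq
    for name, seq in PRIMERS.items()
}

if __name__ == "__main__":
    # The 21-nt annealing region of the N primers and the full 37-nt C primers.
    print("N vs N2 changes:", count_changes(PRIMERS["TEVNIA.N"], PRIMERS["TEVNIA.N2"], 21))
    print("C vs C3 changes:", count_changes(PRIMERS["TEVNIA.C"], PRIMERS["TEVNIA.C3"], 37))
    for name, seq in CODING.items():
        # CAA (Gln) GGG (Gly) is the wild-type -1/+1 pair; CAT encodes His.
        print(name, "Gln-Gly:", "CAAGGG" in seq, "His-Gly:", "CATGGG" in seq,
              "Nco I:", NCO_I in seq)
```

Under these assumptions the wild-type primers carry the Gln-Gly codon pair CAA GGG, while
both mutant primers carry CAT GGG, which creates the Nco I sequence CCATGG.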

The cDNAs for different open reading frames (ORFs) encoding heterologous peptides
inserted into pPRO1 included those encoding tobacco mosaic virus (TMV) and soybean
mosaic virus (SMV) coat proteins (CP), as well as the uidA gene encoding the
β-glucuronidase (GUS) activity from E. coli. These ORFs were obtained by PCR using as
template publicly available nucleotide sequences. The nucleotide sequence of tobacco
mosaic virus RNA, first published by P. Goelet, et al. (Proc. Natl. Acad. Sci. U.S.A.,
79:5818-5822, 1982), is publicly available from the EMBL and GenBank databases under
Accession Numbers V01408 and J02415. The nucleotide sequence of the CP gene of
soybean mosaic virus, first published by A. Eggenberger, et al. (J. Gen. Virol., 70:1853-
1860, 1989), is available from the EMBL and GenBank databases under Accession Number
D00507. The gene encoding GUS, first disclosed by R. A. Jefferson, et al. (Proc. Natl.
Acad. Sci. U.S.A., 83:8447-8451, 1986) and available from the EMBL and GenBank
databases under Accession Number M14641, was obtained from Clontech. For PCR to
obtain the ORF of TMV CP, primer TMVCP.51 (SEQUENCE ID NO. 13) was used at
the 5' end and primer TMVCP.31 (SEQUENCE ID NO. 14) was used at the 3' end. For PCR
to obtain the ORF of SMV CP, primer SMVCP.N1 (SEQUENCE ID NO. 15) was used
at the 5' end and primer SMVCP.C2 (SEQUENCE ID NO. 16) was used at the 3' end.
For PCR to obtain the ORF of GUS, primer GUS.N2 (SEQUENCE ID NO. 17) was used
at the 5' end and primer GUS.C1 (SEQUENCE ID NO. 18) was used at the 3' end.

TABLE 1

SEQUENCES OF THE OLIGONUCLEOTIDE PRIMERS USED

TEVNIA.N     5'-GC TCTAGA CCCGGG GAACCAGTCTATTTCCAAGGG-3'  (SEQ. ID NO. 7)

TEVNIA.C     5'-GC GAATTC A AGGCCT CCCTTGCGAGTACACCAATTCA-3'  (SEQ. ID NO. 8)

TEVNTR.5     5'-GCC GAGCTC AGATCT AAATAACAAATCTCAACACAACA-3'  (SEQ. ID NO. 9)

TEVNTR.3     5'-TCC CCCGGG CATGGCTATCGTTCGTAAATGG-3'  (SEQ. ID NO. 10)

TEVNIA.N2 b  5'-TGG CCCGGG GAACCAGTCTATTTCCATGGG-3'  (SEQ. ID NO. 11)
                                            *

TEVNIA.C3 b  5'-GC GAATTC A AGGCCT CCCATGGGAGTACACCAATTCA-3'  (SEQ. ID NO. 12)
                                      *  *

TMVCP.51     5'-AA AGGCCT TCTTACAGTATCACTACTCC-3'  (SEQ. ID NO. 13)

TMVCP.31     5'-AGG CCCGGG AGTTGCAGGACCAGAGGTCC-3'  (SEQ. ID NO. 14)

SMVCP.N1     5'-AA AGGCCT TCAGGCAAGGAGAAGG-3'  (SEQ. ID NO. 15)

SMVCP.C2     5'-AGG CCCGGG CTGCGGTGGGCCCATGC-3'  (SEQ. ID NO. 16)

GUS.N2       5'-AA AGGCCT GTAGAAACCCCAACCCG-3'  (SEQ. ID NO. 17)

GUS.C1       5'-CG GAATTC TCATTGTTTGCCTCCCTGCTG-3'  (SEQ. ID NO. 18)


Nucleotides annealing to the target genes are underlined with a single line, whereas 
nucleotides corresponding to the restriction enzyme recognition sequences are doubly 
underlined. 

b Nucleotides changed in TEVNIA.N2 and TEVNIA.C3, when compared with
TEVNIA.N and TEVNIA.C, respectively, are marked by an asterisk underneath.
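
The restriction sites attributed to each primer in Example 1 can be located with a simple
string search. The following Python sketch is illustrative only and is not part of the
original disclosure: the function name find_sites is hypothetical, the primer sequences are
taken from the Sequence Listing, and the recognition sequences are the standard ones for
the named enzymes. Because all six recognition sequences are palindromic, scanning one
strand is sufficient.

```python
# Standard recognition sequences for the enzymes named in Example 1.
SITES = {
    "Xba I": "TCTAGA",
    "Sma I": "CCCGGG",
    "Stu I": "AGGCCT",
    "EcoR I": "GAATTC",
    "Sac I": "GAGCTC",
    "Bgl II": "AGATCT",
}

# Primer sequences from the Sequence Listing (SEQ ID NOS. 7 to 10).
PRIMERS = {
    "TEVNIA.N": "GCTCTAGACCCGGGGAACCAGTCTATTTCCAAGGG",
    "TEVNIA.C": "GCGAATTCAAGGCCTCCCTTGCGAGTACACCAATTCA",
    "TEVNTR.5": "GCCGAGCTCAGATCTAAATAACAAATCTCAACACAACA",
    "TEVNTR.3": "TCCCCCGGGCATGGCTATCGTTCGTAAATGG",
}


def find_sites(seq):
    """Return {enzyme: 0-based position of the first occurrence} for sites in seq."""
    return {
        enzyme: seq.find(site)
        for enzyme, site in SITES.items()
        if site in seq
    }


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

The search reports Xba I and Sma I in TEVNIA.N, Stu I and EcoR I in TEVNIA.C, Sac I and
Bgl II in TEVNTR.5, and Sma I in TEVNTR.3, in agreement with the description in Example 1.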

PCR products corresponding to SMV- and TMV-CP genes were digested with Stu I and
Sma I and inserted either at the Sma I or the Stu I sites of pPRO1 (Figure 1), depending
on the construct. The PCR product corresponding to the uidA ORF was digested with Stu
I and EcoR I and inserted at the C terminus of NIa in pPRO1.

EXAMPLE 2

IN VITRO TRANSCRIPTION AND TRANSLATION

One µg of plasmid pPRO1 DNA containing the inserted heterologous ORFs, purified from
E. coli through QIAprep mini columns (Qiagen, Chatsworth, CA), was first linearized with
Sal I (which cleaves downstream of pPRO1), and subsequently transcribed in vitro with
T7 RNA polymerase (Epicentre Technologies, Madison, WI). Size and integrity of
transcribed mRNA were confirmed by agarose gel electrophoresis. Approximately one
µg of mRNA was used to program in vitro translation in 25 µL volume reactions using
a nuclease treated rabbit reticulocyte lysate system (Promega, Madison, WI) according
to the manufacturer's protocol. Proteins were synthesized in the presence of 35S-Met and
then analyzed by SDS-PAGE (12.5% polyacrylamide) and autoradiography. However,
since TMV CP contains no methionine residues, 3H-Leu was used when the TMV CP ORF
was translated in vitro. SDS-PAGE was performed according to the method of U. Laemmli
(Nature [London], 227:680-685, 1970).
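
The choice of radiolabel described above follows from the amino acid composition of the
protein to be detected. The following Python sketch is illustrative only and is not part of
the original disclosure: the function name choose_label is hypothetical, the first example
sequence is the N-terminal portion of SEQ ID NO: 6, and the second example sequence is an
invented methionine-free peptide used purely for demonstration.

```python
def choose_label(protein):
    """Pick a radiolabeled amino acid that the protein can incorporate.

    `protein` is a one-letter amino acid string. 35S-Met is preferred when
    methionine is present; otherwise 3H-Leu is used if leucine is present.
    """
    if "M" in protein:
        return "35S-Met"
    if "L" in protein:
        return "3H-Leu"
    return None  # neither label would be incorporated


if __name__ == "__main__":
    # First 22 residues of PRO1 (SEQ ID NO: 6): contains methionine.
    print(choose_label("MPGEPVYFQGKKNQKHKLKMRE"))
    # Hypothetical methionine-free peptide (not a sequence from the source).
    print(choose_label("ACDEFGHIKL"))
```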



As shown in Figure 2, upon in vitro transcription and subsequent in vitro translation in
the presence of 35S-Met, pPRO1 gave the expected translated peptide of approximately
49 kDa. Experimental results demonstrate that this protein corresponded to NIa since it
exhibited the proper proteolytic activity when expressed in pPRO1 as part of a
polyprotein. Other minor bands were also detected, some of which could be due to the
autoproteolysis that releases the VPg (the protein linked to the 5' end of the viral RNA)
from the protease domain in NIa during post-translational processing of TEV as described
in W. Dougherty, et al. (Virology, 181:449-456, 1991).

Construction of Vectors Expressing TMV CP

To confirm that pPRO1 encodes NIa protease activity, several constructs were engineered
in which the CP ORF from tobacco mosaic tobamovirus (TMV) was inserted into the
cassette vector provided herein. These constructs are shown schematically in Figure 3A.
The first two constructs, pPRO1.NT and pPRO1.TN, contained the TMV CP sequence
in the C-terminal or N-terminal cloning sites, respectively. To demonstrate that
processing of the resultant polyprotein was due to recognition and cleavage of the specific
heptapeptides by the NIa protease and not to non-specific degradation, two additional
controls were designed. First, the C-terminal NIa protease domain was removed with a
frameshift mutation at the unique BamHI site, resulting in pPRO1.TΔN (Figure 3A). In
this construct, processing is not expected despite the presence of the naturally occurring
cleavage sequence. Second, using methods described in Example 1, the two target
heptapeptides were mutated to include a Gln to His change at the -1 position. This
mutation at the cleavage site has been previously shown to inhibit the specific processing
by NIa in TEV (Dougherty, et al., 1988, supra; Dougherty, et al., 1989, supra). The
resulting mutant cassette vector was named pPRO4 and the corresponding pPRO4.NT and
pPRO4.TN were also constructed as shown in Figure 3A.

In vitro transcription and translation of TMV CP-containing constructs in the above
described rabbit reticulocyte lysate in the presence of 3H-Leu, upon analysis by SDS-PAGE
(15% polyacrylamide) and fluorography, revealed the expected patterns and sizes
of labeled proteins as shown in Figure 3B. In addition to the 49 kDa protein, a band
corresponding to a protein of approximately 18 kDa was detected in pPRO1.NT and
pPRO1.TN, which is the expected size of TMV CP when expressed in pPRO1
constructs. The CP produced from pPRO1.TN was slightly larger than that produced
from pPRO1.NT, in accordance with the numbers of amino acid residues added when the
cDNA was cloned at the Sma I site versus the Stu I site (see Figure 1C). On the other
hand, the major proteins resulting from constructs pPRO4.NT and pPRO4.TN migrated
at positions corresponding to the size of the precursor polypeptide containing NIa plus
TMV CP (68 kDa). Finally, when the protease domain from NIa was absent
(pPRO1.TΔN), a single protein of about 28 kDa, corresponding to the truncated protein,
was detected.
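
The band patterns described above follow from a simple rule: when the protease is active
and the cleavage sites are intact, each domain is released separately; otherwise the
unprocessed precursor accumulates. The following Python sketch is illustrative only and is
not part of the original disclosure: the function name expected_products is hypothetical,
the sizes are the approximate values stated in this Example, and the small peptide
additions from the cloning sites are ignored.

```python
# Approximate sizes (kDa) stated in Example 2.
SIZES_KDA = {"NIa": 49, "TMV CP": 18}


def expected_products(domains, protease_active=True, sites_intact=True):
    """Return expected product sizes in kDa, largest first.

    Full processing releases every domain; if the protease is inactive or the
    heptapeptide sites are mutated, only the precursor is expected.
    """
    sizes = [SIZES_KDA[name] for name in domains]
    if protease_active and sites_intact:
        return sorted(sizes, reverse=True)
    return [sum(sizes)]


if __name__ == "__main__":
    # pPRO1.NT: wild-type sites, active protease.
    print(expected_products(["NIa", "TMV CP"]))
    # pPRO4.NT: Gln to His mutation at position -1 of both sites.
    print(expected_products(["NIa", "TMV CP"], sites_intact=False))
```

The nominal sum for the unprocessed precursor is 67 kDa, which is close to the approximately
68 kDa band reported for pPRO4.NT and pPRO4.TN. The truncated product of the frameshift
construct is not modeled, since its size depends on the position of the frameshift.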

Results of the in vitro translation followed by immunoprecipitation analyses of these
vectors are shown in Figure 3C. Immunoprecipitation assays were based
upon previously described protocols with minor modifications. Briefly, 20 µL aliquots
of in vitro translation reactions were diluted to 100 µL with TBSN (25 mM Tris-HCl pH
7.5, 150 mM NaCl, 1% Nonidet P-40) and pre-incubated with protein A Sepharose beads
(Sigma, St. Louis, MO) for 15 minutes on ice. After removing the beads, one µL of an
appropriate dilution of a polyclonal antibody raised against TMV CP
(ATCC# PVAS-135) by standard techniques well known in the art was added. The mixture
was incubated for 2-4 hours at 4°C with slow shaking. Subsequently, protein A Sepharose
beads previously blocked with rabbit reticulocyte lysate were added and the mixture was
kept on ice for 15 minutes with occasional shaking. The Sepharose beads were recovered
and washed twice with 0.5 M LiCl, 20 mM Tris-HCl pH 8, once with TBSN, and once
with H2O. Finally, beads containing immunoprecipitated labeled proteins were
resuspended in SDS-PAGE loading buffer and the proteins were analyzed as described
above.

Immunoprecipitation reactions of the proteins produced in vitro using an anti-TMV CP
antibody resulted in precipitation of the expected proteins (Figure 3C). Only those
peptides which included TMV CP sequences were selectively immunoprecipitated,
whereas the 49 kDa NIa protein was not. These data clearly demonstrate that pPRO1
functions as predicted.

Several experiments were carried out to determine whether or not proteolytic processing
could occur in trans. The labeled peptide that was translated from pPRO1.TΔN was not
processed when non-labeled 49 kDa protein translated from pPRO1 was used as source
of NIa proteinase (data not shown). This result is in agreement with previously reported
data (J. Carrington and W. Dougherty, 1987, supra).

EXAMPLE 3

PROTEOLYTIC PROCESSING OF TWO
DIFFERENT PROTEINS INTRODUCED IN pPRO1

pPRO1 was further tested with the introduction of coding sequences for two different
heterologous proteins into the two insertion sites. ORFs encoding coat proteins from
viruses belonging to different groups, SMV (S; potyvirus) and TMV (T), were inserted
to create constructions having the heterologous ORFs in the two possible positions.

Figure 4A shows the resulting constructs pPRO1.SNT and pPRO1.TNS. As shown in
Figure 4B, in vitro transcription and translation of these two constructs gave the predicted
patterns of labeled proteins, resulting in the accumulation of proteins with the expected
sizes of the NIa (49 kDa), SMV CP (around 30 kDa) and TMV CP (around 18 kDa). As
expected, the coat proteins inserted at the Sma I site of pPRO1 gave slightly larger mature
proteins than those inserted at the Stu I site due to incorporation of extra peptides as
described in Figure 1C. Moreover, the more rapidly migrating proteins (predicted to be
the TMV CP) co-migrated with proteins recovered following immunoprecipitation with
anti-TMV CP antibody as in Example 2 above.

EXAMPLE 4

PROTEOLYTIC PROCESSING OF TWO OPEN
READING FRAMES FROM UNRELATED PROTEINS

Another construct, pPRO1.SNG, shown in Figure 5A, consisted of the SMV CP
positioned at the Sma I insertion site of pPRO1 and the open reading frame encoding the
β-glucuronidase activity (GUS) at the Stu I insertion site of pPRO1. As shown in Figure
5B, following in vitro translation in the presence of 35S-Met, the expected profile of
mature proteins was generated. The polypeptide synthesized upon translation of this
construct has a predicted size of about 149 kDa, and is the largest that has been tested
with the pPRO1 expression cassette. In this particular case, a high molecular weight band
corresponding to a polypeptide of approximately 110 kDa was present in relatively low
amounts. This protein probably corresponds to a fusion of the NIa and GUS peptides,
implying that processing was not complete.
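
The possible outcomes of complete and partial processing of the pPRO1.SNG polyprotein can
be enumerated from the domain order. The following Python sketch is illustrative only and
is not part of the original disclosure: the function name fragments is hypothetical, and
the GUS size of 70 kDa is an inference obtained by subtracting the stated sizes of SMV CP
(about 30 kDa) and NIa (49 kDa) from the predicted precursor size (about 149 kDa).

```python
from itertools import product

# Domain order in pPRO1.SNG with nominal sizes in kDa.
# The GUS value is inferred as 149 - 30 - 49 and is an assumption.
SNG = [("SMV CP", 30), ("NIa", 49), ("GUS", 70)]


def fragments(domains, cleaved):
    """Return (name, size) fragments given which inter-domain sites are cut.

    `cleaved[i]` is True when the site between domain i and domain i + 1 is
    processed, so `cleaved` has one entry fewer than `domains`.
    """
    result, names, size = [], [], 0
    for i, (name, kda) in enumerate(domains):
        names.append(name)
        size += kda
        is_last = i == len(domains) - 1
        if is_last or cleaved[i]:
            result.append(("-".join(names), size))
            names, size = [], 0
    return result


if __name__ == "__main__":
    for state in product([True, False], repeat=len(SNG) - 1):
        print(state, fragments(SNG, state))
```

With only the site between SMV CP and NIa processed, the model gives an NIa-GUS fusion with
a nominal size of 119 kDa. This is the species that the text proposes for the band of
approximately 110 kDa; the nominal sum and the gel estimate are both approximate.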

A time course in vitro translation reaction programmed with construct pPRO1.SNG and
having samples withdrawn at 5, 10, 15, 20, 30, 45, 60, and 90 minute intervals showed
the predicted increase in the accumulation of the expected proteins with time as analyzed
by SDS-PAGE (10% polyacrylamide) and autoradiography (Figure 5C). Even at short
incubation times (15 min), no 149 kDa precursor could be detected, indicating efficient
co-translational processing. However, pulse-chase experiments with this construct did not
demonstrate significant post-translational processing of the low amounts of 110 kDa
polypeptide (data not shown).

The foregoing description of the invention is exemplary for purposes of illustration and
explanation. It should be understood that various modifications can be made without
departing from the spirit and scope of the invention. Accordingly, the following claims
are intended to be interpreted to embrace all such modifications.

SUMMARY OF SEQUENCES

Sequence ID No. 1 is an amino acid sequence for the consensus heptapeptide cleavage
sequences that are cleaved by the NIa from TEV.

Sequence ID No. 2 is an amino acid sequence for the consensus heptapeptide cleavage
sequences that are cleaved by the NIa from TEV.

Sequence ID No. 3 is an amino acid sequence for a self-recognized heptapeptide cleavage
sequence at the N terminus of NIa in TEV.

Sequence ID No. 4 is an amino acid sequence for a self-recognized heptapeptide cleavage
sequence at the C terminus of NIa in TEV.

Sequence ID No. 5 is a nucleotide sequence for PRO1 (Figure 1B).

Sequence ID No. 6 is an amino acid sequence for PRO1 (Figure 1B).

Sequence ID No. 7 is a nucleotide sequence for a primer (TEVNIA.N) for amplification
and cloning of cDNA encoding the nuclear inclusion a protein of tobacco etch potyvirus.

Sequence ID No. 8 is a nucleotide sequence for a primer (TEVNIA.C) for amplification
and cloning of cDNA encoding the nuclear inclusion a protein of tobacco etch potyvirus.

Sequence ID No. 9 is a nucleotide sequence for a primer (TEVNTR.5) for amplification
and cloning of the 5' untranslated region of tobacco etch potyvirus.

Sequence ID No. 10 is a nucleotide sequence for a primer (TEVNTR.3) for amplification
and cloning of the 5' untranslated region of tobacco etch potyvirus.

Sequence ID No. 11 is a nucleotide sequence for a primer (TEVNIA.N2) for amplification
and cloning of cDNA encoding the nuclear inclusion protein of tobacco etch potyvirus.

Sequence ID No. 12 is a nucleotide sequence for a primer (TEVNIA.C3) for amplification
and cloning of cDNA encoding the nuclear inclusion protein of tobacco etch potyvirus.

Sequence ID No. 13 is a nucleotide sequence for a primer (TMVCP.51) for amplification
and cloning of cDNA encoding the tobacco mosaic virus coat protein.

Sequence ID No. 14 is a nucleotide sequence for a primer (TMVCP.31) for amplification
and cloning of cDNA encoding the tobacco mosaic virus coat protein.

Sequence ID No. 15 is a nucleotide sequence for a primer (SMVCP.N1) for amplification
and cloning of cDNA encoding the soybean mosaic virus coat protein.

Sequence ID No. 16 is a nucleotide sequence for a primer (SMVCP.C2) for amplification
and cloning of cDNA encoding the soybean mosaic virus coat protein.

Sequence ID No. 17 is a nucleotide sequence for a primer (GUS.N2) for
amplification and cloning of cDNA encoding β-glucuronidase.

Sequence ID No. 18 is a nucleotide sequence for a primer (GUS.C1) for amplification and
cloning of cDNA encoding β-glucuronidase.
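
The consensus heptapeptides of Sequence ID Nos. 1 and 2 can be expressed as a pattern and
used to scan a protein sequence for candidate NIa cleavage sites. The following Python
sketch is illustrative only and is not part of the original disclosure: the function name
cleavage_positions is hypothetical, the pattern treats each Xaa as any single residue, and
cleavage is placed after the glutamine at position -1 as described in Example 1.

```python
import re

# Glu-Xaa-Xaa-Tyr-Xaa-Gln-(Gly or Ser) in one-letter code.
CONSENSUS = re.compile(r"E..Y.Q[GS]")


def cleavage_positions(protein):
    """Return 0-based indices of the residue just after each predicted cut.

    The cut falls between the Gln (position -1) and the following Gly or Ser,
    so the reported index points at that Gly or Ser.
    """
    return [match.start() + 6 for match in CONSENSUS.finditer(protein)]


if __name__ == "__main__":
    print(cleavage_positions("EPVYFQG"))           # Sequence ID No. 3
    print(cleavage_positions("ELVYSQG"))           # Sequence ID No. 4
    print(cleavage_positions("MPGEPVYFQGKKNQKH"))  # N terminus of Sequence ID No. 6
    print(cleavage_positions("EPVYFHG"))           # Gln to His mutant: no match
```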

SEQUENCE LISTING

(2) INFORMATION FOR SEQ ID NO: 1:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 7 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
          (A) NAME/KEY: Peptide
          (B) LOCATION: 1..7
          (D) OTHER INFORMATION: /note= "where X appears, X = any amino acid"

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

     Glu Xaa Xaa Tyr Xaa Gln Gly
       1               5

(2) INFORMATION FOR SEQ ID NO: 2:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 7 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
          (A) NAME/KEY: Peptide
          (B) LOCATION: 1..7
          (D) OTHER INFORMATION: /note= "where X appears, X = any amino acid"

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

     Glu Xaa Xaa Tyr Xaa Gln Ser
       1               5

(2) INFORMATION FOR SEQ ID NO: 3:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 7 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
          (A) NAME/KEY: Peptide
          (B) LOCATION: 1..7

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

     Glu Pro Val Tyr Phe Gln Gly
       1               5

(2) INFORMATION FOR SEQ ID NO: 4:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 7 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
          (A) NAME/KEY: Peptide
          (B) LOCATION: 1..7

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

     Glu Leu Val Tyr Ser Gln Gly
       1               5

(2) INFORMATION FOR SEQ ID NO: 5:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1488 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (vii) IMMEDIATE SOURCE:
          (B) CLONE: PRO1

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 156..1481

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GAGCTCAGAT CTAAATAACA AATCTCAACA CAACATATAC AAAACAAACG AATCTCAAGC  60

AATCAAGCAT TCTACTTCTA TTGCAGCAAT TTAAATCATT TCTTTTAAAG CAAAAGCAAT  120

TTTCTGAAAA TTTTCACCAT TTACGAACGA TAGCC ATG CCC GGG GAA CCA GTC  173
                                       Met Pro Gly Glu Pro Val
                                         1               5

TAT TTC CAA GGG AAG AAG AAT CAG AAG CAC AAG CTT AAG ATG AGA GAG  221
Tyr Phe Gln Gly Lys Lys Asn Gln Lys His Lys Leu Lys Met Arg Glu
             10                  15                  20

GCG CGT GGG GCT AGA GGG CAA TAT GAG GTT GCA GCG GAC GCA GGG GCG  269
Ala Arg Gly Ala Arg Gly Gln Tyr Glu Val Ala Ala Asp Ala Gly Ala
         25                  30                  35

CTA GAA CAT TAC TTT GGA AGC GCA TAT AAT AAC AAA GGA AAG CGC AAG  317
Leu Glu His Tyr Phe Gly Ser Ala Tyr Asn Asn Lys Gly Lys Arg Lys
     40                  45                  50

GGC ACC ACG AGA GGA ATG GGT GCA AAG TCT CGG AAA TTC ATA AAC ATG  365
Gly Thr Thr Arg Gly Met Gly Ala Lys Ser Arg Lys Phe Ile Asn Met
 55                  60                  65                  70

TAT GGG TTT GAT CCA ACT GAT TTT TCA TAC ATT AGG TTT GTG GAT CCA  413
Tyr Gly Phe Asp Pro Thr Asp Phe Ser Tyr Ile Arg Phe Val Asp Pro
                 75                  80                  85

TTG ACA GGT CAC ACT ATT GAT GAG TCC ACA AAC GCA CCT ATT GAT TTA  461
Leu Thr Gly His Thr Ile Asp Glu Ser Thr Asn Ala Pro Ile Asp Leu
             90                  95                 100

GTG CAG CAT GAG TTT GGA AAG GTT AGA ACA CGC ATG TTA ATT GAC GAT  509
Val Gln His Glu Phe Gly Lys Val Arg Thr Arg Met Leu Ile Asp Asp
        105                 110                 115

GAG ATA GAG CCT CAA AGT CTT AGC ACC CAC ACC ACA ATC CAT GCT TAT  557
Glu Ile Glu Pro Gln Ser Leu Ser Thr His Thr Thr Ile His Ala Tyr
    120                 125                 130

TTG GTG AAT AGT GGC ACG AAG AAA GTT CTT AAG GTT GAT TTA ACA CCA  605
Leu Val Asn Ser Gly Thr Lys Lys Val Leu Lys Val Asp Leu Thr Pro
135                 140                 145                 150

CAC TCG TCG CTA CGT GCG AGT GAG AAA TCA ACA GCA ATA ATG GGA TTT  653
His Ser Ser Leu Arg Ala Ser Glu Lys Ser Thr Ala Ile Met Gly Phe
                155                 160                 165

CCT GAA AGG GAG AAT GAA TTG CGT CAA ACC GGC ATG GCA GTG CCA GTG  701
Pro Glu Arg Glu Asn Glu Leu Arg Gln Thr Gly Met Ala Val Pro Val
            170                 175                 180

GCT TAT GAT CAA TTG CCA CCA AAG AGT GAG GAC TTG ACG TTT GAA GGA  749
Ala Tyr Asp Gln Leu Pro Pro Lys Ser Glu Asp Leu Thr Phe Glu Gly
        185                 190                 195

GAA AGC TTG TTT AAG GGA CCA CGT GAT TAC AAC CCG ATA TCG AGC ACC  797
Glu Ser Leu Phe Lys Gly Pro Arg Asp Tyr Asn Pro Ile Ser Ser Thr
    200                 205                 210



ATT TGT CAC TTG ACG AAT GAA TCT GAT GGG CAC ACA ACA TCG TTG TAT  845
Ile Cys His Leu Thr Asn Glu Ser Asp Gly His Thr Thr Ser Leu Tyr
215                 220                 225                 230

GGT ATT GGA TTT GGT CCC TTC ATC ATT ACA AAC AAG CAC TTG TTT AGA  893
Gly Ile Gly Phe Gly Pro Phe Ile Ile Thr Asn Lys His Leu Phe Arg
                235                 240                 245

AGA AAT AAT GGA ACA CTG TTG GTC CAA TCA CTA CAT GGT GTA TTC AAG  941
Arg Asn Asn Gly Thr Leu Leu Val Gln Ser Leu His Gly Val Phe Lys
            250                 255                 260

GTC AAG AAC ACC ACG ACT TTG CAA CAA CAC CTC ATT GAT GGG AGG GAC  989
Val Lys Asn Thr Thr Thr Leu Gln Gln His Leu Ile Asp Gly Arg Asp
        265                 270                 275

ATG ATA ATT ATT CGC ATG CCT AAG GAT TTC CCA CCA TTT CCT CAA AAG  1037
Met Ile Ile Ile Arg Met Pro Lys Asp Phe Pro Pro Phe Pro Gln Lys
    280                 285                 290

CTG AAA TTT AGA GAG CCA CAA AGG GAA GAG CGC ATA TGT CTT GTG ACA  1085
Leu Lys Phe Arg Glu Pro Gln Arg Glu Glu Arg Ile Cys Leu Val Thr
295                 300                 305                 310

ACC AAC TTC CAA ACT AAG AGC ATG TCT AGC ATG GTG TCA GAC ACT AGT  1133
Thr Asn Phe Gln Thr Lys Ser Met Ser Ser Met Val Ser Asp Thr Ser
                315                 320                 325

TGC ACA TTC CCT TCA TCT GAT GGC ATA TTC TGG AAG CAT TGG ATT CAA  1181
Cys Thr Phe Pro Ser Ser Asp Gly Ile Phe Trp Lys His Trp Ile Gln
            330                 335                 340

ACC AAG GAT GGG CAG TGT GGC AGT CCA TTA GTA TCA ACT AGA GAT GGG  1229
Thr Lys Asp Gly Gln Cys Gly Ser Pro Leu Val Ser Thr Arg Asp Gly
        345                 350                 355

TTC ATT GTT GGT ATA CAC TCA GCA TCG AAT TTC ACC AAC ACA AAC AAT  1277
Phe Ile Val Gly Ile His Ser Ala Ser Asn Phe Thr Asn Thr Asn Asn
    360                 365                 370

TAT TTC ACA AGC GTG CCG AAA AAC TTC ATG GAA TTG TTG ACA AAT CAG  1325
Tyr Phe Thr Ser Val Pro Lys Asn Phe Met Glu Leu Leu Thr Asn Gln
375                 380                 385                 390

GAG GCG CAG CAG TGG GTT AGT GGT TGG CGA TTA AAT GCT GAC TCA GTA  1373
Glu Ala Gln Gln Trp Val Ser Gly Trp Arg Leu Asn Ala Asp Ser Val
                395                 400                 405

TTG TGG GGG GGC CAT AAA GTT TTC ATG AGC AAA CCT GAA GAG CCT TTT  1421
Leu Trp Gly Gly His Lys Val Phe Met Ser Lys Pro Glu Glu Pro Phe
            410                 415                 420

CAG CCA GTT AAG GAA GCG ACT CAA CTC ATG AGT GAA TTG GTG TAC TCG  1469
Gln Pro Val Lys Glu Ala Thr Gln Leu Met Ser Glu Leu Val Tyr Ser
        425                 430                 435

CAA GGG AGG CCT TGAATTC  1488
Gln Gly Arg Pro
    440
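
The feature coordinates given for SEQ ID NO: 5 can be checked for internal consistency with
a few lines of arithmetic. The following Python sketch is illustrative only and is not part
of the original disclosure: the variable names are hypothetical, and the values are taken
from the LENGTH and LOCATION fields and from the final line of the sequence above.

```python
TOTAL_LENGTH = 1488              # LENGTH field of SEQ ID NO: 5
CDS_START, CDS_END = 156, 1481   # LOCATION field, 1-based and inclusive
TAIL = "TGAATTC"                 # nucleotides that follow the last codon (CCT)

# Length of the coding region and the number of complete codons it contains.
cds_length = CDS_END - CDS_START + 1
codons, remainder = divmod(cds_length, 3)

# Untranslated nucleotides before and after the coding region.
leader_length = CDS_START - 1
trailer_length = TOTAL_LENGTH - CDS_END

if __name__ == "__main__":
    print("CDS length (nt):", cds_length)
    print("codons:", codons, "remainder:", remainder)
    print("5' leader (nt):", leader_length)
    print("3' trailer (nt):", trailer_length)
    print("trailer starts with stop codon TGA:", TAIL.startswith("TGA"))
    print("trailer contains EcoR I site:", "GAATTC" in TAIL)
```

The coding region spans 1326 nucleotides, or 442 codons, which agrees with the length of
442 amino acids stated for SEQ ID NO: 6. The seven trailing nucleotides contain the stop
codon and an EcoR I recognition sequence.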



(2) INFORMATION FOR SEQ ID NO: 6:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 442 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Pro Gly Glu Pro Val Tyr Phe Gln Gly Lys Lys Asn Gln Lys His
  1               5                  10                  15

Lys Leu Lys Met Arg Glu Ala Arg Gly Ala Arg Gly Gln Tyr Glu Val
             20                  25                  30

Ala Ala Asp Ala Gly Ala Leu Glu His Tyr Phe Gly Ser Ala Tyr Asn
         35                  40                  45

Asn Lys Gly Lys Arg Lys Gly Thr Thr Arg Gly Met Gly Ala Lys Ser
     50                  55                  60

Arg Lys Phe Ile Asn Met Tyr Gly Phe Asp Pro Thr Asp Phe Ser Tyr
 65                  70                  75                  80

Ile Arg Phe Val Asp Pro Leu Thr Gly His Thr Ile Asp Glu Ser Thr
                 85                  90                  95

Asn Ala Pro Ile Asp Leu Val Gln His Glu Phe Gly Lys Val Arg Thr
            100                 105                 110

Arg Met Leu Ile Asp Asp Glu Ile Glu Pro Gln Ser Leu Ser Thr His
        115                 120                 125

Thr Thr Ile His Ala Tyr Leu Val Asn Ser Gly Thr Lys Lys Val Leu
    130                 135                 140

Lys Val Asp Leu Thr Pro His Ser Ser Leu Arg Ala Ser Glu Lys Ser
145                 150                 155                 160

Thr Ala Ile Met Gly Phe Pro Glu Arg Glu Asn Glu Leu Arg Gln Thr
                165                 170                 175

Gly Met Ala Val Pro Val Ala Tyr Asp Gln Leu Pro Pro Lys Ser Glu
            180                 185                 190

Asp Leu Thr Phe Glu Gly Glu Ser Leu Phe Lys Gly Pro Arg Asp Tyr
        195                 200                 205

Asn Pro Ile Ser Ser Thr Ile Cys His Leu Thr Asn Glu Ser Asp Gly
    210                 215                 220

His Thr Thr Ser Leu Tyr Gly Ile Gly Phe Gly Pro Phe Ile Ile Thr
225                 230                 235                 240

Asn Lys His Leu Phe Arg Arg Asn Asn Gly Thr Leu Leu Val Gln Ser
                245                 250                 255



Leu His Gly Val Phe Lys Val Lys Asn Thr Thr Thr Leu Gln Gln His
            260                 265                 270

Leu Ile Asp Gly Arg Asp Met Ile Ile Ile Arg Met Pro Lys Asp Phe
        275                 280                 285

Pro Pro Phe Pro Gln Lys Leu Lys Phe Arg Glu Pro Gln Arg Glu Glu
    290                 295                 300

Arg Ile Cys Leu Val Thr Thr Asn Phe Gln Thr Lys Ser Met Ser Ser
305                 310                 315                 320

Met Val Ser Asp Thr Ser Cys Thr Phe Pro Ser Ser Asp Gly Ile Phe
                325                 330                 335

Trp Lys His Trp Ile Gln Thr Lys Asp Gly Gln Cys Gly Ser Pro Leu
            340                 345                 350

Val Ser Thr Arg Asp Gly Phe Ile Val Gly Ile His Ser Ala Ser Asn
        355                 360                 365

Phe Thr Asn Thr Asn Asn Tyr Phe Thr Ser Val Pro Lys Asn Phe Met
    370                 375                 380

Glu Leu Leu Thr Asn Gln Glu Ala Gln Gln Trp Val Ser Gly Trp Arg
385                 390                 395                 400

Leu Asn Ala Asp Ser Val Leu Trp Gly Gly His Lys Val Phe Met Ser
                405                 410                 415

Lys Pro Glu Glu Pro Phe Gln Pro Val Lys Glu Ala Thr Gln Leu Met
            420                 425                 430

Ser Glu Leu Val Tyr Ser Gln Gly Arg Pro
        435                 440

(2) INFORMATION FOR SEQ ID NO: 7:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 35 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (vii) IMMEDIATE SOURCE:
          (B) CLONE: TEVNIA.N

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..35

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

GCTCTAGACC CGGGGAACCA GTCTATTTCC AAGGG  35

(2) INFORMATION FOR SEQ ID NO: 8:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 37 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (vii) IMMEDIATE SOURCE:
          (B) CLONE: TEVNIA.C

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..37

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

GCGAATTCAA GGCCTCCCTT GCGAGTACAC CAATTCA  37

(2) INFORMATION FOR SEQ ID NO: 9:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 38 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (vii) IMMEDIATE SOURCE:
          (B) CLONE: TEVNTR.5

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..38

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GCCGAGCTCA GATCTAAATA ACAAATCTCA ACACAACA  38

(2) INFORMATION FOR SEQ ID NO: 10:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 31 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (vii) IMMEDIATE SOURCE:
          (B) CLONE: TEVNTR.3

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..31

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

TCCCCCGGGC ATGGCTATCG TTCGTAAATG G  31

(2) INFORMATION FOR SEQ ID NO: 11:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 30 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (vii) IMMEDIATE SOURCE:
          (B) CLONE: TEVNIA.N2b

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..30

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

TGGCCCGGGG AACCAGTCTA TTTCCATGGG  30

(2) INFORMATION FOR SEQ ID NO: 12:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 37 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (vii) IMMEDIATE SOURCE:
          (B) CLONE: TEVNIA.C3b

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..37

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

GCGAATTCAA GGCCTCCCAT GGGAGTACAC CAATTCA  37

(2) INFORMATION FOR SEQ ID NO: 13:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 28 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (vii) IMMEDIATE SOURCE:
          (B) CLONE: TMVCP.51

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..28

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

AAAGGCCTTC TTACAGTATC ACTACTCC  28

(2) INFORMATION FOR SEQ ID NO: 14:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 29 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (vii) IMMEDIATE SOURCE:
          (B) CLONE: TMVCP.31

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..29

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

AGGCCCGGGA GTTGCAGGAC CAGAGGTCC  29

(2) INFORMATION FOR SEQ ID NO: 15:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 24 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (vii) IMMEDIATE SOURCE:
          (B) CLONE: SMVCP.N1

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..24

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

AAAGGCCTTC AGGCAAGGAG AAGG  24



(2) INFORMATION FOR SEQ ID NO: 16:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 26 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

   (vii) IMMEDIATE SOURCE:
          (B) CLONE: SMVCP.C2

    (ix) FEATURE:
          (A) NAME/KEY: CDS
          (B) LOCATION: 1..26

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

AGGCCCGGGC TGCGGTGGGC CCATGC  26



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: GUS.N2

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..25

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
AAAGGCCTGT AGAAACCCCA ACCCG

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: GUS.C1

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..29

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CGGAATTCTC ATTGTTTGCC TCCCTGCTG
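
As a quick consistency check on SEQ ID NOS: 12-18 above, the following minimal Python sketch compares each sequence with its declared length and reports which of three common restriction recognition motifs it contains. The choice of EcoRI, StuI and SmaI is an assumption made for illustration only, and the names PRIMERS, SITES, sites_in and check_primers are hypothetical and do not come from the application.

```python
# Sequences copied from SEQ ID NOS: 12-18, keyed by SEQ ID number,
# each paired with the length declared in its listing entry.
PRIMERS = {
    12: (37, "GCGAATTCAAGGCCTCCCATGGGAGTACACCAATTCA"),
    13: (28, "AAAGGCCTTCTTACAGTATCACTACTCC"),
    14: (29, "AGGCCCGGGAGTTGCAGGACCAGAGGTCC"),
    15: (24, "AAAGGCCTTCAGGCAAGGAGAAGG"),
    16: (26, "AGGCCCGGGCTGCGGTGGGCCCATGC"),
    17: (25, "AAAGGCCTGTAGAAACCCCAACCCG"),
    18: (29, "CGGAATTCTCATTGTTTGCCTCCCTGCTG"),
}

# Assumption: these three enzymes are chosen for illustration only.
SITES = {"EcoRI": "GAATTC", "StuI": "AGGCCT", "SmaI": "CCCGGG"}


def sites_in(seq):
    """Return the names of the illustrative motifs that occur in seq."""
    return [name for name, motif in SITES.items() if motif in seq]


def check_primers():
    """Map each SEQ ID number to (length matches declaration, motifs found)."""
    return {
        seq_id: (len(seq) == declared, sites_in(seq))
        for seq_id, (declared, seq) in PRIMERS.items()
    }


if __name__ == "__main__":
    for seq_id, (length_ok, found) in check_primers().items():
        status = "length ok" if length_ok else "length mismatch"
        print(seq_id, status, found)
```

Under these assumptions every sequence matches its declared length, and each one carries at least one of the three motifs near its 5' end.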
CLAIMS

1. An expression cassette comprising:
a nucleotide sequence encoding: 

a) the nuclear inclusion (NIa) protease from tobacco etch virus; 

b) multiple restriction endonuclease sites; and 

c) self-cleavage sites for the protease, wherein the self-cleavage sites flank 
the protease and each restriction site, except at the termini of the 
nucleotide sequence. 



2. An expression cassette vector comprising:
a nucleotide sequence encoding: 

the nuclear inclusion (NIa) protease from tobacco etch virus; 
multiple restriction endonuclease sites; 

self-cleavage sites for the protease, wherein the self-cleavage sites flank 
the protease and each restriction site, except at the termini of the 
nucleotide sequence; and 

expression control elements operably linked to the nucleotide sequence. 

3. An expression cassette vector comprising:
a nucleotide sequence encoding:

a) the nuclear inclusion protein (NIa) from tobacco etch virus flanked by
self-cleavage sequences therefor; and

b) restriction endonuclease sites flanking the self-cleavage sequences; and

expression control elements operably linked to the nucleotide sequence.

4. The vector of claim 2 wherein the nucleotide sequence further comprises:

a) an N-terminal start codon; and 

b) a C-terminal stop codon. 

5. The vector of claim 2 wherein at least one of the cleavage sequences encodes the 
amino acid sequence Sequence ID No. 1, wherein X is any amino acid. 

6. The vector of claim 3 wherein at least one of the cleavage sequences encodes the 
amino acid sequence Sequence ID No. 2, wherein X is any amino acid. 

7. The vector of claim 6 wherein the nucleotide sequence further comprises upstream 
of the open reading frames therein the 5' non-translated region from TEV RNA. 

8. The vector of claim 2 wherein the N-terminus cleavage sequence encodes the 
amino acid sequence Sequence ID No. 4. 



9. The vector of claim 8 wherein the C-terminus cleavage sequence encodes the
amino acid sequence Sequence ID No. 5.



10. The vector of claim 2 wherein the restriction sites are blunt-ended.

11. The vector of claim 2 wherein the restriction sites are unique. 

12. The cassette of claim 1 having the nucleotide sequence of Sequence ID No. 5.

13. The vector of claim 2 wherein one of the restriction endonuclease sites is a
multiple restriction site.

14. The vector of claim 2 or 3 wherein a nucleotide sequence encoding a heterologous
protein is inserted into each restriction endonuclease site.

15. An expression cell comprising the vector of claim 2.

16. An expression cell comprising the vector of claim 3.

17. The expression cell of claim 15 wherein the cell is a plant cell.

18. The expression cell of claim 15 wherein the cell is a prokaryotic cell.

19. A method for obtaining heterogeneous peptides in equimolar amounts comprising:

a) cleaving two or more of the restriction endonuclease sites with enzymes
specific therefor;

b) inserting DNA encoding a heterogeneous peptide into each cleaved 
restriction site; 

c) transfecting a suitable cell with the vector; 

d) culturing the transformed cell; and 

e) obtaining the heterogeneous peptides in equimolar amounts. 

20. The method of claim 19 wherein the cell is a plant cell.

21. The method of claim 20 wherein the plant cell is a plant protoplast and the
culturing is in vitro.

22. The method of claim 19 wherein the cell is in a leaf of a plant and the culturing
is in vivo.

23. The method of claim 19 wherein the cell is a prokaryote.

24. The vector of claim 2 or 3 wherein the promoter is the T7 polymerase promoter 
and the vector is derived from the infectious cDNA clone of TMV. 

25. A plant cell infected with the vector of claim 24. 
RECOGNITION SEQUENCE (site label: Sm)

Met Pro Gly Glu Pro Val Tyr Phe Gln Gly
ATGCCCGGGGAACCAGTCTATTTCCAAGGG

RECOGNITION SEQUENCE (site labels: St, Ec)

Glu Leu Val Tyr Ser Gln Gly Arg Pro STOP
GAATTGGTGTACTCGCAAGGGAGGCCTTGAATTC

TEV - NIa (site labels: Ba, Sa)

Sm: Pro - X - [PROTEIN] - X - Gly - Glu - Pro - Val - Tyr - Phe - Gln

St: Gly - Arg - X - [PROTEIN] - X - Pro
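
The amino acid lines of the two recognition sequences above can be checked against their nucleotide lines. The following is a minimal Python sketch, assuming the standard genetic code and reading frame 1. The names CODON_TABLE, translate, FIRST_SITE and SECOND_SITE are illustrative and do not come from the application.

```python
# Standard genetic code in the conventional TCAG ordering of codon positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate frame 1 to one-letter codes, stopping after the first stop ('*')."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


# Nucleotide lines copied from the two recognition sequences in the figure.
FIRST_SITE = "ATGCCCGGGGAACCAGTCTATTTCCAAGGG"
SECOND_SITE = "GAATTGGTGTACTCGCAAGGGAGGCCTTGAATTC"

if __name__ == "__main__":
    print(translate(FIRST_SITE))
    print(translate(SECOND_SITE))
```

Under these assumptions the first sequence translates to MPGEPVYFQG and the second to ELVYSQGRP followed by a stop codon, which agrees with the three-letter translations printed in the figure.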
Thursday, November 18    Page 1
PR01 Map2 (1 > 1488) Site and Sequence

Enzymes: 49 of 207 enzymes (Filtered)
Settings: Linear, Certain & Uncertain Sites, Standard Genetic Code

GAGCTCAGATCTAAATAACAAATCTCAACACAACATATACAAAACAAACGAATCTCAAGCAATCAAGCAT  70
ctcgagtctagatttattgtttagagttgtgttgtatatgttttgtttgcttagagttcgttagttcgta

tctacttctattgcagcaatttaaatcatttcttttaaagcaaaagcaattttctgaaaattttcaccat  140
agatgaagataacgtcgttaaatttagtaaagaaaatttcgttttcgttaaaagacttttaaaagtggta
TEV 5'

TTACGAACGATAGCCATGCCCGGGGAACCAGTCTATTTCCAAGGGAAGAAGAATCAGAAGCAr/VAnrTTA
5' Leader | Cleavage Seq. | TEV - NIa
Met Pro Gly Glu Pro Val Tyr Phe Gln Gly Lys Lys Asn Gln Lys His Lys Leu

AGATGAGAGAGGCGCGTGGGGCTAGAGGGCAATATGAGGTTGCAGCGGACGCAGGGGCGCTAGAACATTA  280
TCTACTCTCTCCGCGCACCCCGATCTCCCGTTATACTCCAACGTCGCCTGCGTCCCCGCGATCTTGTAAT
Lys Met Arg Glu Ala Arg Gly Ala Arg Gly Gln Tyr Glu Val Ala Ala Asp Ala Gly Ala Leu Glu His Tyr

Figure 18
CTTTGGAAGCGCATATAATAACAAAGGAAAGCGCAAGGGCACCACGAGAGGAATGGGTGCAAAGTCTCGG  350
GAAACCTTCGCGTATATTATTGTTTCCTTTCGCGTTCCCGTGGTGCTCTCCTTACCCACGTTTCAGAGCC
TEV - NIa
Phe Gly Ser Ala Tyr Asn Asn Lys Gly Lys Arg Lys Gly Thr Thr Arg Gly Met Gly Ala Lys Ser Arg

AAATTCATAAACATGTATGGGTTTGATCCAACTGATTTTTCATACATTAGGTTTGTGGATCCATTGACAG
TTTAAGTATTTGTACATACCCAAACTAGGTTGACTAAAAAGTATGTAATCCAAACACCTAGGTAACTGTC
TEV - NIa
Lys Phe Ile Asn Met Tyr Gly Phe Asp Pro Thr Asp Phe Ser Tyr Ile Arg Phe Val Asp Pro Leu Thr

GTCACACTATTGATGAGTCCACAAACGCACCTATTGATTTAGTGCAGCATGAGTTTGGAAAGGTTAGAAC  490
CAGTGTGATAACTACTCAGGTGTTTGCGTGGATAACTAAATCACGTCGTACTCAAACCTTTCCAATCTTG
TEV - NIa
Gly His Thr Ile Asp Glu Ser Thr Asn Ala Pro Ile Asp Leu Val Gln His Glu Phe Gly Lys Val Arg Thr

ACGCATGTTAATTGACGATGAGATAGAGCCTCAAAGTCTTAGCACCCACACCACAATCCATGCTTATTTG  560
TGCGTACAATTAACTGCTACTCTATCTCGGAGTTTCAGAATCGTGGGTGTGGTGTTAGGTACGAATAAAC
TEV - NIa
Arg Met Leu Ile Asp Asp Glu Ile Glu Pro Gln Ser Leu Ser Thr His Thr Thr Ile His Ala Tyr Leu

GTGAATAGTGGCACGAAGAAAGTTCTTAAGGTTGATTTAACACCACACTCGTCGCTACGTGCGAGTGAGA  630
CACTTATCACCGTGCTTCTTTCAAGAATTCCAACTAAATTGTGGTGTGAGCAGCGATGCACGCTCACTCT
TEV - NIa
Val Asn Ser Gly Thr Lys Lys Val Leu Lys Val Asp Leu Thr Pro His Ser Ser Leu Arg Ala Ser Glu

AATCAACAGCAATAATGGGATTTCCTGAAAGGGAGAATGAATTGCGTCAAACCGGCATGGCAGTGCCAGT  700
TTAGTTGTCGTTATTACCCTAAAGGACTTTCCCTCTTACTTAACGCAGTTTGGCCGTACCGTCACGGTCA
TEV - NIa
Lys Ser Thr Ala Ile Met Gly Phe Pro Glu Arg Glu Asn Glu Leu Arg Gln Thr Gly Met Ala Val Pro Val

GGCTTATGATCAATTGCCACCAAAGAGTGAGGACTTGACGTTTGAAGGAGAAAGCTTGTTTAAGGGACCA  770
CCGAATACTAGTTAACGGTGGTTTCTCACTCCTGAACTGCAAACTTCCTCTTTCGAACAAATTCCCTGGT
TEV - NIa
Ala Tyr Asp Gln Leu Pro Pro Lys Ser Glu Asp Leu Thr Phe Glu Gly Glu Ser Leu Phe Lys Gly Pro

CGTGATTACAACCCGATATCGAGCACCATTTGTCACTTGACGAATGAATCTGATGGGCACACAACATCGT  840
GCACTAATGTTGGGCTATAGCTCGTGGTAAACAGTGAACTGCTTACTTAGACTACCCGTGTGTTGTAGCA
TEV - NIa
Arg Asp Tyr Asn Pro Ile Ser Ser Thr Ile Cys His Leu Thr Asn Glu Ser Asp Gly His Thr Thr Ser

TGTATGGTATTGGATTTGGTCCCTTCATCATTACAAACAAGCACTTGTTTAGAAGAAATAATGGAACACT  910
ACATACCATAACCTAAACCAGGGAAGTAGTAATGTTTGTTCGTGAACAAATCTTCTTTATTACCTTGTGA
TEV - NIa
Leu Tyr Gly Ile Gly Phe Gly Pro Phe Ile Ile Thr Asn Lys His Leu Phe Arg Arg Asn Asn Gly Thr Leu

GTTGGTCCAATCACTACATGGTGTATTCAAGGTCAAGAACACCACGACTTTGCAACAACACCTCATTGAT  980
CAACCAGGTTAGTGATGTACCACATAAGTTCCAGTTCTTGTGGTGCTGAAACGTTGTTGTGGAGTAACTA
TEV - NIa
Leu Val Gln Ser Leu His Gly Val Phe Lys Val Lys Asn Thr Thr Thr Leu Gln Gln His Leu Ile Asp




GGGAGGGACATGATAATTATTCGCATGCCTAAGGATTTCCCACCATTTCCTCAAAAGCTGAAATTTAGAG  1050
CCCTCCCTGTACTATTAATAAGCGTACGGATTCCTAAAGGGTGGTAAAGGAGTTTTCGACTTTAAATCTC
TEV - NIa
Gly Arg Asp Met Ile Ile Ile Arg Met Pro Lys Asp Phe Pro Pro Phe Pro Gln Lys Leu Lys Phe Arg

AGCCACAAAGGGAAGAGCGCATATGTCTTGTGACAACCAACTTCCAAACTAAGAGCATGTCTAGCATGGT  1120
TCGGTGTTTCCCTTCTCGCGTATACAGAACACTGTTGGTTGAAGGTTTGATTCTCGTACAGATCGTACCA
TEV - NIa
Glu Pro Gln Arg Glu Glu Arg Ile Cys Leu Val Thr Thr Asn Phe Gln Thr Lys Ser Met Ser Ser Met Val

GTCAGACACTAGTTGCACATTCCCTTCATCTGATGGCATATTCTGGAAGCATTGGATTCAAACCAAGGAT  1190
CAGTCTGTGATCAACGTGTAAGGGAAGTAGACTACCGTATAAGACCTTCGTAACCTAAGTTTGGTTCCTA
TEV - NIa
Ser Asp Thr Ser Cys Thr Phe Pro Ser Ser Asp Gly Ile Phe Trp Lys His Trp Ile Gln Thr Lys Asp

GGGCAGTGTGGCAGTCCATTAGTATCAACTAGAGATGGGTTCATTGTTGGTATACACTCAGCATCGAATT  1260
CCCGTCACACCGTCAGGTAATCATAGTTGATCTCTACCCAAGTAACAACCATATGTGAGTCGTAGCTTAA
TEV - NIa
Gly Gln Cys Gly Ser Pro Leu Val Ser Thr Arg Asp Gly Phe Ile Val Gly Ile His Ser Ala Ser Asn

TCACCAACACAAACAATTATTTCACAAGCGTGCCGAAAAACTTCATGGAATTGTTGACAAATCAGGAGGC  1330
AGTGGTTGTGTTTGTTAATAAAGTGTTCGCACGGCTTTTTGAAGTACCTTAACAACTGTTTAGTCCTCCG
TEV - NIa
Phe Thr Asn Thr Asn Asn Tyr Phe Thr Ser Val Pro Lys Asn Phe Met Glu Leu Leu Thr Asn Gln Glu Ala

GCAGCAGTGGGTTAGTGGTTGGCGATTAAATGCTGACTCAGTATTGTGGGGGGGCCATAAAGTTTTCATG  1400
cgtcgtcacccaatcaccaaccgctaatttacgactgagtcataacacccccccggtatttcaaaagtac
TEV - NIa
Gln Gln Trp Val Ser Gly Trp Arg Leu Asn Ala Asp Ser Val Leu Trp Gly Gly His Lys Val Phe Met

AGCAAACCTGAAGAGCCTTTTCAGCCAGTTAAGGAAGCGACTCAACTCATGAGTGAATTGGTGTACTCGC  1470
TCGTTTGGACTTCTCGGAAAAGTCGGTCAATTCCTTCGCTGAGTTGAGTACTCACTTAACCACATGAGCG
TEV - NIa | Cleavage
Ser Lys Pro Glu Glu Pro Phe Gln Pro Val Lys Glu Ala Thr Gln Leu Met Ser Glu Leu Val Tyr Ser

AAGGGAGGCCTTGAATTC  1488
TTCCCTCCGGAACTTAAG
Seq.
Gln Gly Arg Pro •
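
In the map above, each upper strand is printed over its base-by-base complement, read left to right without reversal. A minimal Python sketch of that relationship follows, using the final 18 bases of the map with the spacing ignored. The names COMPLEMENT, complement, TOP and BOTTOM are illustrative, and the position range in the comment is inferred from the end position 1488 rather than stated in the figure.

```python
# Translation table for base-by-base complementation, upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(strand: str) -> str:
    """Return the complement in the same left-to-right order (no reversal)."""
    return strand.translate(COMPLEMENT)


# Final block of the map (inferred positions 1471-1488), copied from the figure.
TOP = "AAGGGAGGCCTTGAATTC"
BOTTOM = "TTCCCTCCGGAACTTAAG"

if __name__ == "__main__":
    print(complement(TOP) == BOTTOM)
```

The same check can be applied to any other block of the map to catch transcription errors in either strand.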